Improving Software Pipelining with Unroll-and-Jam

Authors

Steve Carr

Chen Ding

Philip H. Sweany

Abstract

To take advantage of recent architectural improvements in microprocessors, advanced compiler optimizations such as software pipelining have been developed [1, 2, 3, 4]. Unfortunately, not all loops have enough parallelism in the innermost loop body to take advantage of all of the resources a machine provides. Unroll-and-jam is a transformation that can be used to increase the amount of parallelism in the innermost loop body by making better use of resources and limiting the effects of recurrences [5, 6]. In this paper, we demonstrate how unroll-and-jam can significantly improve the initiation interval in a software-pipelined loop. Improvements in the initiation interval of greater than 40% are common, while dramatic improvements of a factor of 5 are possible.
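As a rough illustration of the transformation the abstract describes, the sketch below applies unroll-and-jam by hand to a small loop nest. The loop nest, the function names `sum_original` and `sum_unroll_and_jam`, and the unroll factor of 2 are assumptions chosen for illustration, not code taken from the paper. The sketch also assumes that `a` and `b` do not overlap in memory, which is what makes the transformation legal here.

```c
#include <assert.h>
#include <stddef.h>

/* Original nest: every inner iteration updates the same a[j], so the
 * inner loop carries a single recurrence through the addition. */
void sum_original(double *a, const double *b, size_t n, size_t m)
{
    assert(a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            a[j] += b[i];
}

/* Unroll-and-jam by a factor of 2 (hypothetical factor): the outer loop
 * is unrolled and the two copies of the inner loop are fused ("jammed").
 * The inner body now holds two independent recurrences, on a[j] and
 * a[j + 1], and a single load of b[i] feeds both additions. */
void sum_unroll_and_jam(double *a, const double *b, size_t n, size_t m)
{
    assert(a != NULL && b != NULL);
    size_t j = 0;
    for (; j + 1 < n; j += 2) {
        for (size_t i = 0; i < m; i++) {
            double bi = b[i];   /* loaded once, used twice */
            a[j]     += bi;
            a[j + 1] += bi;
        }
    }
    /* Cleanup loop for the last row when n is odd. */
    for (; j < n; j++)
        for (size_t i = 0; i < m; i++)
            a[j] += b[i];
}
```

Both functions add the elements of `b` into each `a[j]` in the same order, so they should produce identical results. In the jammed version, a software pipeliner has two independent addition chains to overlap in each inner iteration instead of one, which is the kind of extra innermost-loop parallelism the abstract refers to. The actual effect on the initiation interval depends on the target machine.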

Similar resources

Register Pressure Guided Unroll-and-Jam

Unroll-and-jam is an effective loop optimization that not only improves cache locality and instruction level parallelism (ILP) but also benefits other loop optimizations such as scalar replacement. However, unroll-and-jam increases register pressure, potentially resulting in performance degradation when the increase in register pressure causes register spilling. In this paper, we present a low ...

Unroll-And-Jam Guided by A Linear-Algebra-Based Data-Reuse Model

Because of the existence of a memory bottleneck in modern microprocessors, idle computational cycles in pipelined multiple functional units slow down the program performance. One solution to this problem is applying loop unroll-and-jam to improve the ratio of memory operations to floating-point operations for loops according to the target machine optimal ratio. In doing so, both enough computat...

Optimizing Sparse Matrix - Vector Product Computations Using Unroll and Jam

Large-scale scientific applications frequently compute sparse matrix vector products in their computational core. For this reason, techniques for computing sparse matrix vector products efficiently on modern architectures are important. This paper describes a strategy for improving the performance of sparse matrix vector product computations using a loop transformation known as unroll-and-jam. ...

Source-to-Source Transformations for Efficient SIMD Code Generation

In the last years, there has been much effort in commercial compilers to generate efficient SIMD instructions-based code sequences from conventional sequential programs. However, the small numbers of compilers that can automatically use these instructions achieve in most cases unsatisfactory results. Therefore, the code often has to be written manually in assembly language or using compiler bui...

A Model for Hardware Realization of Kernel Loops

Hardware realization of kernel loops holds the promise of accelerating the overall application performance and is therefore an important part of the synthesis process. In this paper, we consider two important loop optimization techniques, namely loop unrolling and software pipelining that can impact the performance and cost of the synthesized hardware. We propose a novel model that accounts for...



Publication date: 1996
